49 research outputs found

An Experimental Evaluation of Nearest Neighbour Time Series Classification

Author: Bagnall Anthony
Lines Jason
Publication date: 01/01/2014

Data mining research into time series classification (TSC) has focussed on alternative distance measures for nearest neighbour classifiers. It is standard practice to use 1-NN with Euclidean or dynamic time warping (DTW) distance as a straw man for comparison. As part of a wider investigation into elastic distance measures for TSC~\cite{lines14elastic}, we perform a series of experiments to test whether this standard practice is valid. Specifically, we compare 1-NN classifiers with Euclidean and DTW distance to standard classifiers, examine whether the performance of 1-NN Euclidean approaches that of 1-NN DTW as the number of cases increases, assess whether there is any benefit in setting k for k-NN through cross validation, assess whether it is worth setting the warping path for DTW through cross validation, and finally ask whether it is better to use a window or weighting for DTW. Based on experiments on 77 problems, we conclude that 1-NN with Euclidean distance is fairly easy to beat but 1-NN with DTW is not, if window size is set through cross validation
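As an illustration of the baseline this abstract discusses, the following is a minimal sketch of 1-NN classification with DTW restricted to a warping window, plus a leave-one-out estimate that could be used to choose the window. It is not the authors' implementation: the function names, the squared pointwise cost, and the Sakoe-Chiba style band are assumptions made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    `window` is the maximum allowed |i - j| offset on the warping path
    (an assumed Sakoe-Chiba style band); None means unconstrained.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide so the final cell is reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # squared pointwise cost
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return math.sqrt(cost[n][m])


def predict_1nn(train_x, train_y, query, window=None):
    """Return the label of the training series closest to `query` under DTW."""
    best = min(range(len(train_x)),
               key=lambda i: dtw_distance(train_x[i], query, window))
    return train_y[best]


def loo_accuracy(train_x, train_y, window):
    """Leave-one-out accuracy on the training set for a given window size."""
    hits = 0
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        hits += predict_1nn(rest_x, rest_y, train_x[i], window) == train_y[i]
    return hits / len(train_x)
```

With `window=0` and equal-length series the distance reduces to Euclidean distance. One way to set the window through cross validation is to call `loo_accuracy` for several candidate window sizes and keep the best-scoring one.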

arXiv.org e-Print Archive

CiteSeerX

University of East Anglia digital repository

Finding Motif Sets in Time Series

Author: Bagnall Anthony
Hills Jon
Lines Jason
Publication date: 01/01/2014

Time-series motifs are representative subsequences that occur frequently in a time series; a motif set is the set of subsequences deemed to be instances of a given motif. We focus on finding motif sets. Our motivation is to detect motif sets in household electricity-usage profiles, representing repeated patterns of household usage. We propose three algorithms for finding motif sets. Two are greedy algorithms based on pairwise comparison, and the third uses a heuristic measure of set quality to find the motif set directly. We compare these algorithms on simulated datasets and on electricity-usage data. We show that Scan MK, the simplest way of using the best-matching pair to find motif sets, is less accurate on our synthetic data than Set Finder and Cluster MK, although the latter is very sensitive to parameter settings. We qualitatively analyse the outputs for the electricity-usage data and demonstrate that both Scan MK and Set Finder can discover useful motif sets in such data

arXiv.org e-Print Archive

University of East Anglia digital repository

Time Series classification through transformation and ensembles

Author: Lines Jason
Publication date: 01/02/2015

The problem of time series classification (TSC), where we consider any real-valued ordered data a time series, offers a specific challenge. Unlike traditional classification problems, the ordering of attributes is often crucial for identifying discriminatory features between classes. TSC problems arise across a diverse range of domains, and this variety has meant that no single approach outperforms all others. The general consensus is that the benchmark for TSC is nearest neighbour (NN) classifiers using Euclidean distance or Dynamic Time Warping (DTW). Though conceptually simple, many have reported that NN classifiers are very difficult to beat and new work is often compared to NN classifiers. The majority of approaches have focused on classification in the time domain, typically proposing alternative elastic similarity measures for NN classification. Other work has investigated more specialised approaches, such as building support vector machines on variable intervals and creating tree-based ensembles with summary measures. We wish to answer a specific research question: given a new TSC problem without any prior, specialised knowledge, what is the best way to approach the problem? Our thesis is that the best methodology is to first transform data into alternative representations where discriminatory features are more easily detected, and then build ensemble classifiers on each representation. In support of our thesis, we propose an elastic ensemble classifier that we believe is the first ever to significantly outperform DTW on the widely used UCR datasets. Next, we propose the shapelet-transform, a new data transformation that allows complex classifiers to be coupled with shapelets, which outperforms the original algorithm and is competitive with DTW. Finally, we combine these two works with heterogeneous ensembles built on autocorrelation and spectral-transformed data to propose a collective of transformation-based ensembles (COTE).
The results of COTE are, we believe, the best ever published on the UCR datasets

University of East Anglia digital repository

Classification of Household Devices by Electricity Usage Profiles

Author: Anderson Simon
Bagnall Anthony
Caiger-Smith Patrick
Lines Jason A.
Publication date: 01/01/2011

Crossref

University of East Anglia digital repository

Time Series Classification with HIVE-COTE: The Hierarchical Vote Collective of Transformation-based Ensembles

Author: Bagnall Anthony
Lines Jason
Taylor Sarah
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/07/2018

A recent experimental evaluation assessed 19 time series classification (TSC) algorithms and found that one was significantly more accurate than all others: the Flat Collective of Transformation-based Ensembles (Flat-COTE). Flat-COTE is an ensemble that combines 35 classifiers over four data representations. However, while comprehensive, the evaluation did not consider deep learning approaches. Convolutional neural networks (CNN) have seen a surge in popularity and are now state of the art in many fields, which raises the question of whether CNNs could be equally transformative for TSC. We implement a benchmark CNN for TSC using a common structure and use results from a TSC-specific CNN from the literature. We compare both to Flat-COTE and find that the collective is significantly more accurate than both CNNs. These results are impressive, but Flat-COTE is not without deficiencies. We significantly improve the collective by proposing a new hierarchical structure with probabilistic voting, defining and including two novel ensemble classifiers built in existing feature spaces, and adding further modules to represent two additional transformation domains. The resulting classifier, the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE), encapsulates classifiers built on five data representations. We demonstrate that HIVE-COTE is significantly more accurate than Flat-COTE (and all other TSC algorithms that we are aware of) over 100 resamples of 85 TSC problems and is the new state of the art for TSC. Further analysis is included through the introduction and evaluation of 3 new case studies and extensive experimentation on 1000 simulated datasets of 5 different types

University of East Anglia digital repository

HIVE-COTE: The hierarchical vote collective of transformation-based ensembles for time series classification: IEEE International Conference on Data Mining

Author: Bagnall Anthony
Lines Jason
Taylor Sarah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/12/2016

There have been many new algorithms proposed over the last five years for solving time series classification (TSC) problems. A recent experimental comparison of the leading TSC algorithms has demonstrated that one approach is significantly more accurate than all others over 85 datasets. That approach, the Flat Collective of Transformation-based Ensembles (Flat-COTE), achieves superior accuracy through combining predictions of 35 individual classifiers built on four representations of the data into a flat hierarchy. Outside of TSC, deep learning approaches such as convolutional neural networks (CNN) have seen a recent surge in popularity and are now state of the art in many fields. An obvious question is whether CNNs could be equally transformative in the field of TSC. To test this, we implement a common CNN structure and compare performance to Flat-COTE and a recently proposed time series-specific CNN implementation. We find that Flat-COTE is significantly more accurate than both deep learning approaches on 85 datasets. These results are impressive, but Flat-COTE is not without deficiencies. We improve the collective by adding new components and proposing a modular hierarchical structure with a probabilistic voting scheme that allows us to encapsulate the classifiers built on each transformation. We add two new modules representing dictionary and interval-based classifiers, and significantly improve upon the existing frequency domain classifiers with a novel spectral ensemble. The resulting classifier, the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE), is significantly more accurate than Flat-COTE and represents a new state of the art for TSC. HIVE-COTE captures more sources of possible discriminatory features in time series and has a more modular, intuitive structure

University of East Anglia digital repository

A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates

Author: Large James
Lines Jason
Bagnall Anthony
Publication date: 01/11/2019

Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed through cross-validation on the train data. We demonstrate through extensive experimentation that, given the same small set of base classifiers, this method has measurable benefits over commonly used alternative weighting, selection or meta classifier approaches to heterogeneous ensembles. We also show how an ensemble of five well known, fast classifiers can produce an ensemble that is not significantly worse than large homogeneous ensembles and tuned individual classifiers on datasets from the UCI archive. We provide evidence that the performance of the Cross-validation Accuracy Weighted Probabilistic Ensemble (CAWPE) generalises to a completely separate set of datasets, the UCR time series classification archive, and we also demonstrate that our ensemble technique can significantly improve the state-of-the-art classifier for this problem domain. We investigate the performance in more detail, and find that the improvement is most marked in problems with smaller train sets. We perform a sensitivity analysis and an ablation study to demonstrate the robustness of the ensemble and the significant contribution of each design element of the classifier. We conclude that it is, on average, better to ensemble strong classifiers with a weighting scheme rather than perform extensive tuning and that CAWPE is a sensible starting point for combining classifiers
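The weighting idea described in this abstract can be sketched roughly as follows: each base classifier's class-probability estimates are scaled by its cross-validated accuracy raised to a power, then summed and normalised. This is only an illustrative sketch, not the authors' code. The function name `cawpe_combine`, the parameter name `alpha`, and its default value are assumptions for this example; the abstract says only that the weighting is exponential.

```python
def cawpe_combine(probas, cv_accuracies, alpha=4.0):
    """Combine per-classifier class probabilities for one test case.

    probas        -- one probability list per base classifier
    cv_accuracies -- cross-validated train accuracy of each base classifier
    alpha         -- exponent applied to the accuracies (assumed default)
    """
    # Raising accuracies to a power widens the gap between
    # stronger and weaker classifiers.
    weights = [acc ** alpha for acc in cv_accuracies]
    n_classes = len(probas[0])
    combined = [sum(w * p[c] for w, p in zip(weights, probas))
                for c in range(n_classes)]
    total = sum(combined)
    if total == 0:
        # Degenerate case (all weights zero): fall back to a uniform estimate.
        return [1.0 / n_classes] * n_classes
    return [v / total for v in combined]


def predict(probas, cv_accuracies, alpha=4.0):
    """Index of the class with the largest combined probability."""
    combined = cawpe_combine(probas, cv_accuracies, alpha)
    return max(range(len(combined)), key=combined.__getitem__)
```

Setting `alpha=0` gives every classifier equal weight, which is a convenient reference point when checking how much the accuracy weighting changes a prediction.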

University of East Anglia digital repository

“The Support Continuum” Exploring how support workers understand their role in supporting adults with learning disabilities to use the internet for personal and sexual relationships

Author: LINES Jason
Publication date: 01/08/2019

With internet use prominent in daily life, research investigating how adults with learning disabilities are accessing and using the internet is increasingly relevant. Three papers are presented in this thesis which aimed to provide additional understanding about this research topic. The first paper outlines a review of the literature regarding what factors influence how adults with intellectual disabilities access and use the internet. The existing literature suggests a shift in the technology used to access the internet, from computers to smartphones. It also shows a shift in the purpose of internet use, from only using the internet for emails, to multi-platform usage, mainly social media. Significantly, it highlighted how important it is for some adults with learning disabilities to have access to support to assist with using the internet; in addition, the perceptions of those supporting impacted on how much support a person would receive. The second paper details the empirical research that was undertaken in response to findings from the literature review. Eight support workers took part in this qualitative study which looked at how support workers understand their role supporting adults with learning disabilities to use the internet for personal and sexual relationships. Interviews were transcribed and analysed using thematic analysis. The themes of ‘Social and Organisational dilemmas’ with subthemes ‘Role and Moral positioning’, ‘Expectations of Support’ and ‘Protected and Reflective space’; ‘Policy dilemmas’ and ‘Power and position’ were found and discussed. This research highlighted the current gap in training and guidance available for support workers regarding supporting people to use the internet for personal and sexual relationships, suggesting more must be done to develop these training opportunities.
The final paper is an executive summary which condenses the empirical research and presents it in a format accessible to adults with learning disabilities, support workers, and organisations employing support workers

STORE - Staffordshire Online Repository